
    Learning with Kernels


    Classifying LEP Data with Support Vector Algorithms

    We have studied the application of different classification algorithms in the analysis of simulated high energy physics data. Whereas neural network algorithms have become a standard tool for data analysis, the performance of other classifiers, such as Support Vector Machines, has not yet been tested in this environment. We chose two different problems to compare the performance of a Support Vector Machine and a neural net trained with back-propagation: tagging events of the type e+e- -> ccbar, and identifying muons produced in multihadronic e+e- annihilation events. Comment: 7 pages, 4 figures, submitted to proceedings of AIHENP99, Crete, April 1999.

    Synthesis and Characterization of Copolymers of Lanthanide Complexes with Styrene

    Copolymers of 2-methyl-5-phenylpentene-1-dione-3,5 with styrene in a 5:95 ratio, containing Eu, Yb, and Eu, Yb with 1,10-phenanthroline, were synthesized for the first time. The luminescence spectra of the obtained metal complexes and copolymers in solutions, films, and the solid state are investigated and analyzed. The solubilization of the β-diketonate complexes with phenanthroline was shown to change the luminescence intensity in such complexes. The obtained copolymers can be used as potential materials for organic light-emitting devices.

    A Kernel Method for the Two-sample Problem

    We propose a framework for analyzing and comparing distributions, allowing us to design statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS). We present two tests based on large deviation bounds for the test statistic, while a third is based on the asymptotic distribution of this statistic. The test statistic can be computed in quadratic time, although efficient linear-time approximations are available. Several classical metrics on distributions are recovered when the function space used to compute the difference in expectations is allowed to be more general (e.g., a Banach space). We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.
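    The quadratic-time statistic described in this abstract (the maximum mean discrepancy, MMD) is short enough to sketch directly. The following is a minimal pure-Python illustration on scalar samples with a Gaussian RBF kernel, not the authors' implementation; the sample sizes and bandwidth are arbitrary choices for the example.

```python
import math
import random

def rbf(x, y, sigma=1.0):
    # Gaussian RBF kernel k(x, y) = exp(-(x - y)^2 / (2 * sigma^2)).
    return math.exp(-((x - y) ** 2) / (2 * sigma ** 2))

def mmd2_biased(xs, ys, kernel=rbf):
    # Biased quadratic-time estimate of the squared MMD:
    # mean k(x, x') + mean k(y, y') - 2 * mean k(x, y).
    m, n = len(xs), len(ys)
    kxx = sum(kernel(a, b) for a in xs for b in xs) / (m * m)
    kyy = sum(kernel(a, b) for a in ys for b in ys) / (n * n)
    kxy = sum(kernel(a, b) for a in xs for b in ys) / (m * n)
    return kxx + kyy - 2 * kxy

random.seed(0)
same = [random.gauss(0, 1) for _ in range(200)]
other = [random.gauss(0, 1) for _ in range(200)]   # same distribution
shifted = [random.gauss(1, 1) for _ in range(200)]  # mean-shifted distribution

baseline = mmd2_biased(same, other)      # small: same underlying distribution
separated = mmd2_biased(same, shifted)   # larger: distributions differ
```

A full test would compare `separated` against a threshold derived from the large-deviation or asymptotic bounds mentioned above; the sketch only computes the statistic itself.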

    Local Ranking Problem on the BrowseGraph

    The "Local Ranking Problem" (LRP) concerns the computation of a centrality-like rank on a local graph, where the scores of the nodes can differ significantly from the ones computed on the global graph. Previous work has studied the LRP on the hyperlink graph, but never on the BrowseGraph, namely a graph where nodes are webpages and edges are browsing transitions. Recently, this graph has received more and more attention in tasks such as ranking, prediction, and recommendation. However, a web server only observes the browsing traffic performed on its own pages (the local BrowseGraph) and, as a consequence, the local computation can lead to estimation errors, which can hinder the growing number of applications that build on this graph. Also, although the divergence between the local and global ranks has been measured, the possibility of estimating this divergence using only local knowledge has been largely overlooked. These aspects are of great interest for online service providers who want to: (i) gauge their ability to correctly assess the importance of their resources based only on their local knowledge, and (ii) take into account real user browsing fluxes, which capture actual user interest better than the static hyperlink network. We study the LRP on a BrowseGraph from a large news provider, considering as subgraphs the aggregations of browsing traces of users coming from different domains. We show that the distance between rankings can be accurately predicted based only on structural information of the local graph, achieving an average rank correlation as high as 0.8.
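    The local-vs-global ranking gap is easy to reproduce on a toy graph. The sketch below is an illustrative assumption, not the paper's method or data: it runs plain PageRank power iteration on a five-page graph and on the three-page subgraph a single server would see, where edges leaving the local view are simply dropped.

```python
def pagerank(edges, nodes, damping=0.85, iters=100):
    # Power iteration on a directed graph given as {node: [out-neighbours]},
    # restricted to the node set `nodes` (edges leaving it are ignored).
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {v: (1 - damping) / n for v in nodes}
        for u in nodes:
            outs = [v for v in edges.get(u, []) if v in nodes]
            if outs:
                share = damping * rank[u] / len(outs)
                for v in outs:
                    nxt[v] += share
            else:
                # Dangling node: spread its mass uniformly.
                for v in nodes:
                    nxt[v] += damping * rank[u] / n
        rank = nxt
    return rank

# Hypothetical global browse graph over pages A..E, vs the local view of a
# server that only hosts (and only observes traffic among) pages A, B, C.
edges = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": ["E"], "E": ["A"]}
global_rank = pagerank(edges, {"A", "B", "C", "D", "E"})
local_rank = pagerank(edges, {"A", "B", "C"})
```

In the local view, C's outgoing edge to D disappears, so A, B, C form a plain cycle and score identically, while the global computation ranks them differently; this is exactly the kind of divergence the paper sets out to predict.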

    On landmark selection and sampling in high-dimensional data analysis

    In recent years, the spectral analysis of appropriately defined kernel matrices has emerged as a principled way to extract the low-dimensional structure often prevalent in high-dimensional data. Here we provide an introduction to spectral methods for linear and nonlinear dimension reduction, emphasizing ways to overcome the computational limitations currently faced by practitioners with massive datasets. In particular, a data subsampling or landmark selection process is often employed to construct a kernel based on partial information, followed by an approximate spectral analysis termed the Nyström extension. We provide a quantitative framework to analyse this procedure, and use it to demonstrate algorithmic performance bounds on a range of practical approaches designed to optimize the landmark selection process. We compare the practical implications of these bounds by way of real-world examples drawn from the field of computer vision, whereby low-dimensional manifold structure is shown to emerge from high-dimensional video data streams. Comment: 18 pages, 6 figures, submitted for publication.
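    The core of the Nyström idea is approximating a full kernel matrix K by C W⁻¹ Cᵀ, where C holds kernel values between all points and a few landmarks and W is the small landmark-landmark kernel matrix. The toy sketch below (an illustration, not the paper's framework) inverts W by hand and so allows at most two landmarks; the points, landmarks, and bandwidth are arbitrary assumptions.

```python
import math

def rbf(x, y, sigma=1.0):
    # Gaussian kernel between two scalars.
    return math.exp(-((x - y) ** 2) / (2 * sigma ** 2))

def nystrom_approx(points, landmarks, kernel=rbf):
    # Nystrom approximation K_hat = C W^{-1} C^T, where
    # C[i][p] = k(points[i], landmarks[p]), W[p][q] = k(l_p, l_q).
    # For a self-contained sketch we invert W by hand (<= 2 landmarks).
    C = [[kernel(x, l) for l in landmarks] for x in points]
    if len(landmarks) == 1:
        winv = [[1.0 / kernel(landmarks[0], landmarks[0])]]
    elif len(landmarks) == 2:
        a = kernel(landmarks[0], landmarks[0])
        b = kernel(landmarks[0], landmarks[1])
        d = kernel(landmarks[1], landmarks[1])
        det = a * d - b * b
        winv = [[d / det, -b / det], [-b / det, a / det]]
    else:
        raise ValueError("sketch supports at most 2 landmarks")
    n, m = len(points), len(landmarks)
    return [[sum(C[i][p] * winv[p][q] * C[j][q]
                 for p in range(m) for q in range(m))
             for j in range(n)] for i in range(n)]

points = [0.0, 0.5, 1.0, 1.5, 2.0]
approx = nystrom_approx(points, landmarks=[0.5, 1.5])
exact = [[rbf(x, y) for y in points] for x in points]
```

A useful sanity check is that the approximation is exact on rows belonging to landmark points (here indices 1 and 3); landmark selection, the subject of the paper, is precisely about choosing landmarks so that the error on the remaining rows is small.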

    A framework for space-efficient string kernels

    String kernels are typically used to compare genome-scale sequences whose length makes alignment impractical, yet their computation is based on data structures that are either space-inefficient or incur large slowdowns. We show that a number of exact string kernels, like the k-mer kernel, the substring kernels, a number of length-weighted kernels, the minimal absent words kernel, and kernels with Markovian corrections, can all be computed in O(nd) time and in o(n) bits of space in addition to the input, using just a rangeDistinct data structure on the Burrows-Wheeler transform of the input strings, which takes O(d) time per element in its output. The same bounds hold for a number of measures of compositional complexity based on multiple values of k, like the k-mer profile and the k-th order empirical entropy, and for calibrating the value of k using the data.
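    As a baseline for what these kernels compute, the k-mer (spectrum) kernel has a simple direct implementation: count all length-k substrings of each string and take the inner product of the two count vectors. The sketch below is only illustrative and has none of the space efficiency of the BWT-based method described above.

```python
from collections import Counter

def kmer_profile(s, k):
    # Multiset of all length-k substrings of s (its k-mer profile).
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

def kmer_kernel(s, t, k):
    # Spectrum kernel: inner product of the two k-mer count vectors.
    ps, pt = kmer_profile(s, k), kmer_profile(t, k)
    return sum(count * pt[mer] for mer, count in ps.items())
```

For example, `kmer_kernel("ACGT", "ACGT", 2)` is 3 (the shared 2-mers AC, CG, GT each contribute 1), while strings sharing no k-mer score 0. This direct approach stores the profiles explicitly, which is exactly the space cost the paper's o(n)-bit construction avoids.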

    Deep Learning for Forecasting Stock Returns in the Cross-Section

    Many studies have used machine learning techniques, including neural networks, to predict stock returns. Recently, a method known as deep learning, which achieves high performance mainly in image recognition and speech recognition, has attracted attention in the machine learning field. This paper implements deep learning to predict one-month-ahead stock returns in the cross-section of the Japanese stock market and investigates the performance of the method. Our results show that deep neural networks generally outperform shallow neural networks, and that the best networks also outperform representative machine learning models. These results indicate that deep learning is a promising approach for predicting stock returns in the cross-section. Comment: 12 pages, 2 figures, 8 tables, accepted at PAKDD 201

    Robust artificial neural networks and outlier detection. Technical report

    Large outliers break down linear and nonlinear regression models. Robust regression methods allow one to filter out the outliers when building a model. By replacing the traditional least squares criterion with the least trimmed squares criterion, in which up to half of the data are treated as potential outliers, one can fit accurate regression models to strongly contaminated data. High-breakdown methods have become very well established in linear regression, but have only recently begun to be applied to nonlinear regression. In this work, we examine the problem of fitting artificial neural networks to contaminated data using the least trimmed squares criterion. We introduce a penalized least trimmed squares criterion which prevents unnecessary removal of valid data. Training of ANNs leads to a challenging non-smooth global optimization problem. We compare the efficiency of several derivative-free optimization methods in solving it, and show that our approach identifies the outliers correctly when ANNs are used for nonlinear regression.
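    The least trimmed squares idea can be illustrated on a model much simpler than a neural network. A standard way to approximate the LTS fit is the concentration step: repeatedly refit on the h points with the smallest residuals. The sketch below applies this to a straight line; the data, helper names, and the choice h = 15 are illustrative assumptions, not the paper's penalized criterion or derivative-free solvers.

```python
def ols(pts):
    # Closed-form least squares line y = a*x + b through pts.
    n = len(pts)
    sx = sum(x for x, _ in pts)
    sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts)
    sxy = sum(x * y for x, y in pts)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

def lts_line(pts, h, steps=20):
    # Approximate least trimmed squares via concentration steps:
    # repeatedly refit OLS on the h points with the smallest residuals.
    a, b = ols(pts)
    for _ in range(steps):
        by_resid = sorted(pts, key=lambda p: (p[1] - a * p[0] - b) ** 2)
        a, b = ols(by_resid[:h])
    return a, b

# 20 points exactly on y = 2x + 1, contaminated with 3 gross outliers.
clean = [(x, 2 * x + 1) for x in range(20)]
outliers = [(3, 60.0), (8, -50.0), (12, 90.0)]
data = clean + outliers
slope, intercept = lts_line(data, h=15)   # trimmed fit ignores the outliers
ols_slope, ols_intercept = ols(data)      # plain OLS is pulled off the line
```

Because the trimmed fit ends up refitting only clean, exactly collinear points, it recovers the true line, while plain least squares on the contaminated data does not; the paper's contribution is making this robustness work for ANN training, where the resulting objective is non-smooth.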

    Neuropathology in COVID-19 autopsies is defined by microglial activation and lesions of the white matter with emphasis in cerebellar and brain stem areas

    Introduction: This study aimed to investigate microglial and macrophage activation in 17 patients who died in the context of a COVID-19 infection in 2020 and 2021. Methods: Through immunohistochemical analysis, the lysosomal marker CD68 was used to detect diffuse parenchymal microglial activity, pronounced perivascular macrophage activation, and macrophage clusters. COVID-19 patients were compared to control patients and grouped with regard to clinical aspects. Detection of viral proteins was attempted in different regions using multiple commercially available antibodies. Results: Microglial and macrophage activation was most pronounced in the white matter, with emphasis in brain stem and cerebellar areas. Analysis of lesion patterns yielded no correlation between disease severity and neuropathological changes. The occurrence of macrophage clusters could not be associated with a severe course of disease or with preconditions, but represents a more advanced stage of microglial and macrophage activation. Severe neuropathological changes in COVID-19 were comparable to those in severe influenza. Hypoxic damage was not a confounder for the described neuropathology. The macrophage/microglia reaction was less pronounced in post-COVID-19 patients, but still detectable, e.g., in the brain stem. Commercially available antibodies for the detection of SARS-CoV-2 virus material in immunohistochemistry yielded no specific signal over controls. Conclusion: The presented microglial and macrophage activation might be an explanation for the long COVID syndrome.